CONFIDENCE LIMITS ON PREDICTED VALUES


PURPOSE:  To  determine  the  probable  accuracy  of  a  predicted  value. 

GENERAL: Values used in engineering design are often predicted using data
from laboratory experiments or field measurements. A common approach is to
fit a regression line statistically to the plotted data. This regression
line gives a predicted average value, i.e., the real value has an equal
probability of being above or below the predicted value. It is desirable
to have a confidence limit on the real value. This confidence limit around
the predicted value provides a range of values within which there is a
given probability of finding the real value.

Taking the independent variable as x and the dependent variable (to
be predicted) as y, a data point is given as $(x_i, y_i)$. For a given
value of the independent variable, $x_k$, the predicted value of the
dependent variable is $\hat{y}_k$ and the real value is $y_k$. Regression
lines giving a formula relating y to x, such as:

y  =  a  +  bx  (1) 

for linear regression, can be obtained by established means (see Draper
and Smith, 1966; Mendenhall, 1966; Daniel and Wood, 1971; or Walpole
and Myers, 1978). Various manufacturers produce programmable calculators
which have standard programs for linear regression.
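For readers without such a calculator, the least-squares fit behind equation (1) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `fit_line` and the sample data are hypothetical and are not taken from the references.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares and cross products about the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

Because the sample points lie exactly on a line, the fit should recover the intercept and slope used to generate them.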

Care must be exercised in applying regression analysis and confidence
limits to particular sets of data. The data sets should be plotted
to determine if the relationships appear to be linear or curvilinear.
Also, there must be a clear physical relationship between the dependent
and the independent variables. Some of the references discuss the
underlying assumptions and the precautions which should be observed in
applying regression analysis.

Where linear regression is used for predictions, the confidence limits
can be obtained as follows:

For a given probability, $P_1$, that $y \le (\hat{y} + e_s t)$, or
probability, $P_2$, that $(\hat{y} - e_s t) \le y \le (\hat{y} + e_s t)$,
the value of t is obtained from the table of t values.

Table 1 has been developed from the t-distribution (see Draper and Smith,
1966) and may be used directly to obtain probabilities for the example
problem to be shown. (Note that Table 1 is not a standard t-distribution
table, since one enters the table directly with the number of data points,
n, rather than with the number of data points less two. This assumes that
a t-distribution with n - 2 degrees of freedom is appropriate for the
solution.) The value $e_s$ is the standard error and for a given predicted
value $\hat{y}_k$ is given as:

$$e_s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \, \sqrt{\frac{1}{n} + \frac{(x_k - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \qquad (2)$$


where $\hat{y}_i$ is the predicted value of y corresponding to the data
point $(x_i, y_i)$, n is the number of data points, $\bar{x}$ the mean
value of $x_i$, and $x_k$ is the value of x for which we are predicting
$\hat{y}_k$. For a given probability, the confidence limits for the
various values of $y_k$ will form curves above and below the regression
line defining the values of $\hat{y}_k$.
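The standard error described above, a residual term scaled by a factor that grows as $x_k$ moves away from $\bar{x}$, can be sketched as follows. The function names and the sample data are hypothetical. The coefficients a = 1.06 and b = 1.96 are the least-squares values for that sample, worked out by hand.

```python
import math


def standard_error(xs, ys, a, b, x_k):
    """Standard error e_s of the predicted value at x_k for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    # Residual sum of squares about the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    s = math.sqrt(sse / (n - 2))
    return s * math.sqrt(1.0 / n + (x_k - x_mean) ** 2 / sxx)


def upper_limit(y_hat, e_s, t):
    """One-sided upper confidence limit y_hat + e_s * t."""
    return y_hat + e_s * t


# Hypothetical sample; a and b are its least-squares coefficients
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]
e_mid = standard_error(xs, ys, 1.06, 1.96, 1.5)  # at the mean of x
e_end = standard_error(xs, ys, 1.06, 1.96, 3.0)  # at the edge of the data
print(e_mid, e_end)
```

The error is smallest at the mean of x and increases toward the ends of the data, which is why the confidence limits form curves rather than lines parallel to the regression line.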


*****************  EXAMPLE  ***************** 


GIVEN: At a hypothetical coastal location, there exists a 150-year record
of tsunami flood levels. A total of seven tsunamis occurred having measured
flood levels above mean sea level of 4 feet, 5 feet, 5.5 feet, 7 feet,
8 feet, 12 feet, and 16 feet. All flood levels are for a location 200 feet
shoreward of the coastline.

FIND: A flood level that has a 99 percent probability of not being
exceeded by the 100-year tsunami.

Table 1. t Values

n = number of data points

       Probability $P_1$ that $y \le \hat{y} + e_s t$
   n     0.55     0.65     0.75     0.85      0.9     0.95    0.975     0.99    0.995   0.9995

       Probability $P_2$ that $\hat{y} - e_s t \le y \le \hat{y} + e_s t$
          0.1      0.3      0.5      0.7      0.8      0.9     0.95     0.98     0.99    0.999

   3    0.158    0.510    1.000    1.963    3.078    6.314   12.706   31.821   63.657  636.619
   4    0.142    0.445    0.816    1.386    1.886    2.920    4.303    6.965    9.925   31.598
   5    0.137    0.424    0.765    1.250    1.638    2.353    3.182    4.541    5.841   12.924
   6    0.134    0.414    0.741    1.190    1.533    2.132    2.776    3.747    4.604    8.610
   7    0.132    0.408    0.727    1.156    1.476    2.015    2.571    3.365    4.032    6.869
   8    0.131    0.404    0.718    1.134    1.440    1.943    2.447    3.143    3.707    5.959
   9    0.130    0.402    0.711    1.119    1.415    1.895    2.365    2.998    3.499    5.408
  10    0.130    0.399    0.706    1.108    1.397    1.860    2.306    2.896    3.355    5.041
  11    0.129    0.398    0.703    1.100    1.383    1.833    2.262    2.821    3.250    4.781
  12    0.129    0.397    0.700    1.093    1.372    1.812    2.228    2.764    3.169    4.587
  13    0.129    0.396    0.697    1.088    1.363    1.796    2.201    2.718    3.106    4.437
  14    0.128    0.395    0.695    1.083    1.356    1.782    2.179    2.681    3.055    4.318
  15    0.128    0.394    0.694    1.079    1.350    1.771    2.160    2.650    3.012    4.221
  16    0.128    0.393    0.692    1.076    1.345    1.761    2.145    2.624    2.977    4.140
  17    0.128    0.393    0.691    1.074    1.341    1.753    2.131    2.602    2.947    4.073
  18    0.128    0.392    0.690    1.071    1.337    1.746    2.120    2.583    2.921    4.015
  19    0.128    0.392    0.689    1.069    1.333    1.740    2.110    2.567    2.898    3.965
  20    0.127    0.392    0.688    1.067    1.330    1.734    2.101    2.552    2.878    3.922
  21    0.127    0.391    0.688    1.066    1.328    1.729    2.093    2.539    2.861    3.883
  22    0.127    0.391    0.687    1.064    1.325    1.725    2.086    2.528    2.845    3.850
  23    0.127    0.391    0.686    1.063    1.323    1.721    2.080    2.518    2.831    3.819
  24    0.127    0.390    0.686    1.061    1.321    1.717    2.074    2.508    2.819    3.792
  25    0.127    0.390    0.685    1.060    1.319    1.714    2.069    2.500    2.807    3.767
  26    0.127    0.390    0.685    1.059    1.318    1.711    2.064    2.492    2.797    3.745
  27    0.127    0.390    0.684    1.058    1.316    1.708    2.060    2.485    2.787    3.725
  28    0.127    0.390    0.684    1.058    1.315    1.706    2.056    2.479    2.779    3.707
  29    0.127    0.389    0.684    1.057    1.314    1.703    2.052    2.473    2.771    3.690
  30    0.127    0.389    0.683    1.056    1.313    1.701    2.048    2.467    2.763    3.674
  40    0.126    0.388    0.681    1.051    1.304    1.686    2.025    2.429    2.710    3.564
  50    0.126    0.388    0.680    1.048    1.299    1.678    2.012    2.407    2.684    3.510
 100    0.126    0.386    0.677    1.041    1.289    1.659    1.982    2.365    2.627    3.393
   ∞    0.126    0.385    0.674    1.036    1.282    1.645    1.960    2.326    2.576    3.291
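Two features of Table 1 can be spot-checked with nothing more than the Python standard library. The sketch below assumes that a row entered with n data points corresponds to a t-distribution with n - 2 degrees of freedom, and it uses the closed-form quantiles that exist for 1 and 2 degrees of freedom (rows n = 3 and n = 4). It also assumes the two probability headings are related by $P_2 = 2P_1 - 1$, as follows from the symmetry of the distribution. The function names are illustrative only.

```python
import math


def t_one_df(p1):
    """One-sided t value for 1 degree of freedom (the Cauchy distribution)."""
    return math.tan(math.pi * (p1 - 0.5))


def t_two_df(p1):
    """One-sided t value for 2 degrees of freedom (closed form)."""
    return (2.0 * p1 - 1.0) / math.sqrt(2.0 * p1 * (1.0 - p1))


def two_sided(p1):
    """Two-sided probability P2 that shares a column with one-sided P1."""
    return 2.0 * p1 - 1.0


print(round(t_one_df(0.95), 3))  # compare with row n = 3, column P1 = 0.95
print(round(t_two_df(0.99), 3))  # compare with row n = 4, column P1 = 0.99
print(round(two_sided(0.95), 3))  # compare with the P2 heading of that column
```

For other rows no simple closed form is available, and a statistics library or a standard t table entered with n - 2 degrees of freedom would be needed.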

SOLUTION: The probability of occurrence of each flood level, h, is given
by P(h) = m/(n + 1), where m is the rank of the flood level (m = 1 for the
highest) and n is the period of record in years (here 150, not the number
of data points). This gives:

P(16)  = 1/(150 + 1) = 0.0066
P(12)  = 2/(150 + 1) = 0.0132
P(8)   = 3/(150 + 1) = 0.0199
P(7)   = 4/(150 + 1) = 0.0265
P(5.5) = 5/(150 + 1) = 0.0331
P(5)   = 6/(150 + 1) = 0.0397
P(4)   = 7/(150 + 1) = 0.0464
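The ranking step can be reproduced with a short Python sketch. The variable names are illustrative; the flood levels and the 150-year record length are those given in the example.

```python
# Measured flood levels (feet above mean sea level) from the example
levels_ft = [4, 5, 5.5, 7, 8, 12, 16]
record_years = 150

# Rank the levels from highest (m = 1) to lowest (m = 7)
ranked = sorted(levels_ft, reverse=True)

# Probability of occurrence P(h) = m / (n + 1) for each level
prob = {h: m / (record_years + 1) for m, h in enumerate(ranked, start=1)}

for h in ranked:
    print(f"P({h}) = {prob[h]:.4f}")
```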

Flood levels are plotted against probability in Figure 1.

Using the equation for tsunami flooding given by Houston et al. (1977) and
Camfield (1980) that

h = -B - A log10 P(h) ,  (3)

linear regression from standard methods (references 2,3,5) gives

h = -15.58 - 14.42 log10 P(h) .  (4)
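As a check on equation (4), the regression of h on log10 P(h) can be sketched directly in Python rather than on a calculator. The variable names are illustrative. The probabilities are recomputed from m/151 without rounding, so the coefficients may differ slightly in the last digit from those quoted.

```python
import math

# Flood levels (ft) ranked highest first, and x = log10 P(h) with P(h) = m/151
levels_ft = [16, 12, 8, 7, 5.5, 5, 4]
xs = [math.log10(m / 151) for m in range(1, 8)]

n = len(xs)
x_mean = sum(xs) / n
h_mean = sum(levels_ft) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (h - h_mean) for x, h in zip(xs, levels_ft))

slope = sxy / sxx
intercept = h_mean - slope * x_mean

# Equation (3) is written h = -B - A log10 P(h), so A and B flip the signs
A = -slope
B = -intercept
print(round(A, 2), round(B, 2))
```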


Figure 1. Probability of occurrence of tsunami flood levels (plot not
reproduced; horizontal axis P(h), logarithmic, 0.001 to 0.1).

Note: This figure is only applicable to the given set of data
and should not be used for design purposes.

For P(h) = 0.01 (the 100-year tsunami), from equation (4)

h = -15.58 - 14.42 log10 (0.01) = -15.58 - 14.42 (-2)
h = 28.84 - 15.58 = 13.26 feet


which is the predicted flood level from linear regression. To calculate
$e_s$, note that the independent variable "x" is log10 P(h), where P(h) has
been previously calculated, and the dependent variable "y" is h.

$$e_s = \sqrt{\frac{\sum (h_i - \hat{h}_i)^2}{n - 2}} \, \sqrt{\frac{1}{n} + \frac{(\log_{10} P(h)_k - \overline{\log_{10} P(h)})^2}{\sum (\log_{10} P(h)_i - \overline{\log_{10} P(h)})^2}} \qquad (5)$$


n  =  7  (the  number  of  data  points) 

The term $[\sum (h_i - \hat{h}_i)^2/(n - 2)]^{1/2}$ is obtained directly
from existing programs for linear regression available on programmable
calculators (individual users should refer to manufacturers' handbooks for
the calculators that they have available). In this case, $h_1 = 16$,
$h_2 = 12$, $h_3 = 8$, $h_4 = 7$, $h_5 = 5.5$, $h_6 = 5$, $h_7 = 4$; then

$$\left[\frac{\sum (h_i - \hat{h}_i)^2}{n - 2}\right]^{1/2} = 0.5579$$

(Note: working this problem without a programmable calculator is a long,
tedious process which is beyond the scope of this technical note.)

Equation (5) for the standard error now becomes

$$e_s = 0.5579 \sqrt{\frac{1}{7} + \frac{(\log_{10} 0.01 - \overline{\log_{10} P(h)})^2}{\sum (\log_{10} P(h)_i - \overline{\log_{10} P(h)})^2}}$$

Using a programmable calculator, and inserting values of h and log10 P(h),
the mean value of log10 P(h) is

$$\overline{\log_{10} P(h)} = -1.6501$$

and the summation is given as

$$\sum (\log_{10} P(h)_i - \overline{\log_{10} P(h)})^2 = 0.5307$$

The standard error is now

$$e_s = 0.5579 \sqrt{\frac{1}{7} + \frac{(-2 + 1.6501)^2}{0.5307}}$$

$$e_s = 0.341$$
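The calculator steps leading to the standard error can be retraced in Python. This is a sketch under the same assumptions as the example (seven flood levels, P(h) = m/151), with illustrative variable names. Since it carries full precision throughout, the intermediate values may differ from the quoted ones in the last digit.

```python
import math

# Flood levels (ft) ranked highest first, and x = log10 P(h) with P(h) = m/151
levels_ft = [16, 12, 8, 7, 5.5, 5, 4]
xs = [math.log10(m / 151) for m in range(1, 8)]
n = len(xs)

# Least-squares fit of h on log10 P(h)
x_mean = sum(xs) / n
h_mean = sum(levels_ft) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (h - h_mean) for x, h in zip(xs, levels_ft))
slope = sxy / sxx
intercept = h_mean - slope * x_mean

# Residual term [sum (h_i - h_hat_i)^2 / (n - 2)]^(1/2)
sse = sum((h - (intercept + slope * x)) ** 2 for x, h in zip(xs, levels_ft))
s = math.sqrt(sse / (n - 2))

# Standard error at the 100-year event, x_k = log10(0.01) = -2
x_k = math.log10(0.01)
e_s = s * math.sqrt(1.0 / n + (x_k - x_mean) ** 2 / sxx)

print(round(x_mean, 4), round(sxx, 4), round(s, 3), round(e_s, 3))
```

The printed mean, summation, residual term, and standard error should land close to the values -1.6501, 0.5307, 0.5579, and 0.341 quoted in the text.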

To solve for h for the 100-year tsunami,

$$h \le \hat{h} + e_s t$$

From the table, for n = 7 and $P_1$ = 0.99, t = 3.365, so that

$$h \le 13.26 + 0.341(3.365) = 13.26 + 1.15$$

$$h \le 14.4 \text{ ft}$$

There is a 99 percent probability that the 100-year tsunami will not exceed
a flood level of 14.4 feet. To obtain a confidence limit curve (or the 99
percent confidence limit line in Figure 1), one repeats the computation for
a number of frequency intervals.
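Repeating the computation for several frequencies is straightforward to sketch in Python. The constants below are the rounded values quoted in the example; the function name and the particular probabilities chosen are illustrative assumptions.

```python
import math

# Values quoted in the worked example (rounded as printed)
A, B = 14.42, 15.58   # coefficients of equation (4)
S_RESID = 0.5579      # residual term of equation (5)
X_MEAN = -1.6501      # mean of log10 P(h)
SXX = 0.5307          # sum of squared deviations of log10 P(h)
N = 7                 # number of data points
T_99 = 3.365          # Table 1, n = 7, P1 = 0.99


def upper_limit(p):
    """Return (predicted level, 99 percent upper limit) in feet for P(h) = p."""
    x_k = math.log10(p)
    h_hat = -B - A * x_k
    e_s = S_RESID * math.sqrt(1.0 / N + (x_k - X_MEAN) ** 2 / SXX)
    return h_hat, h_hat + e_s * T_99


# Illustrative set of frequencies; any values of interest could be used
for p in (0.005, 0.01, 0.02, 0.05):
    h_hat, h_max = upper_limit(p)
    print(f"P(h) = {p:<6} h_hat = {h_hat:6.2f} ft   99% limit = {h_max:6.2f} ft")
```

Plotting the second column against P(h) would trace the upper confidence limit curve. As with Figure 1, the results apply only to this hypothetical data set.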